The Web Archive of the Internet Archive, started in late 1996, is made available through the Wayback Machine, and some collections are available in bulk to researchers. Many pages are archived by the Internet Archive on behalf of other contributors, including Archive-It partners and Save Page Now users. Other captures are donated to the Internet Archive by partners such as Alexa Internet.
Topic: Web Archive
The Internet Archive discovers and captures web pages through many different web crawls. At any given time several distinct crawls are running, some lasting for months and others recurring daily or at longer intervals. View the web archive through the Wayback Machine.
Topic: webwidecrawl
Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is composed entirely of volunteers and interested parties, and has expanded into a large number of related projects for saving online and digital history. History is littered with hundreds of conflicts over the future of a community, group, location or...
Archive-It is a subscription web archiving service of the Internet Archive that helps organizations harvest, build, and preserve collections of digital content. Partners create domain-specific collections of web captures that can be searched on Archive-It. Content is hosted and stored at the Internet Archive data centers. Archive-It works with more than 400 partner organizations in 48 U.S. states and 16 countries worldwide, including: College and University Libraries; State Archives, Libraries,...
Topic: Colleges, Universities, Libraries, Archives, NGOs, Museums
Archive-It is the leading web archiving service for collecting and accessing cultural heritage on the web. It is a service of the Internet Archive, used by libraries, archives, governments, non-profits, and other organizations to build collections of web materials.
Topic: TK
Since 1996, Alexa Internet has been donating its crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.
Topics: web crawl, Alexa
This library contains digital images uploaded by Archive users which range from maps to astronomical imagery to photographs of artwork. Many of these images are available for free download.
Topic: images
6.3B
Nov 4, 2011
by Internet Archive
Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
Topic: webcrawl
1.6B
Nov 7, 2020
by Archive Team
Download or listen to free music and audio. This library contains recordings ranging from alternative news programming, to Grateful Dead concerts, to Old Time Radio shows, to book and poetry readings, to original music uploaded by our users. Many of these audios and MP3s are available for free download. Check our FAQ for more information. Contribute Your Audio: Please feel free to upload your audio (Uploaders, please set a Creative Commons license as part of the upload process, so people know...
Topic: Audio
Download or listen to free movies, films, and videos. This library contains digital movies uploaded by Archive users which range from classic full-length films, to daily alternative news broadcasts, to cartoons and concerts. Many of these videos are available for free download. Check our FAQ for more information. Contribute Your Movies and Video: Please feel free to upload your movies (Uploaders, please set a Creative Commons license as part of the upload process, so people know what they can do...
Topic: Moving Images
Web crawl data from Common Crawl.
These crawls are part of an effort to archive pages as they are created and to archive the pages that they refer to. That way, as the referenced pages are changed or removed from the web, a link to the version that was live when the page was written will be preserved. The Internet Archive hopes that references to these archived pages will then be put in place of a link that would otherwise be broken, or added as a companion link to allow people to see what was originally intended by a page's...
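As a rough illustration of the link substitution described above, the sketch below builds a Wayback Machine replay URL from an original URL and the time the citing page was written. It assumes the public `https://web.archive.org/web/<timestamp>/<url>` address pattern, in which the Wayback Machine serves the capture closest to the requested 14-digit timestamp. The function name `wayback_url` and the example URL are hypothetical and are not taken from the crawl's own tooling.

```python
from datetime import datetime, timezone

# Assumed public replay prefix of the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, written_at: datetime) -> str:
    """Return a replay URL for the capture closest to `written_at`.

    `written_at` should be a timezone-aware datetime; it is converted
    to UTC because Wayback timestamps are expressed in UTC.
    """
    # Wayback timestamps are 14-digit strings: YYYYMMDDhhmmss.
    stamp = written_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


if __name__ == "__main__":
    # Hypothetical citing page written on 12 Nov 2013, 08:30 UTC.
    when = datetime(2013, 11, 12, 8, 30, 0, tzinfo=timezone.utc)
    print(wayback_url("http://example.com/page", when))
```

A page author could place the returned address next to the original link as a companion link, so readers can still reach the version that was live at the time of writing.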
Collections of Wiki data
Topics: crawls, data, wiki
Crawl of outlinks from wikipedia.org. These files are currently not publicly accessible. From Wikipedia: Wikipedia is a multilingual, web-based, free-content encyclopedia project operated by the Wikimedia Foundation and based on an openly editable model. The name "Wikipedia" is a portmanteau of the words wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning "quick") and encyclopedia. Wikipedia's articles provide links to guide the...
9.2B
Dec 16, 2004
by Internet Archive
The Internet Archive offers over 20,000,000 freely downloadable books and texts. There is also a collection of 2.3 million modern eBooks that may be borrowed by anyone with a free archive.org account. Borrow a Book Books on Internet Archive are offered in many formats, including DAISY files intended for print disabled people. In addition to the collections here, print disabled people may access a large collection of modern books provided as encrypted DAISY files on...
Topics: Texts, Kindle, Ebook, Nook, Books, Documents
ArchiveBot is an IRC bot designed to automate the archival of smaller websites (e.g. up to a few hundred thousand URLs). You give it a URL to start at, and it grabs all content under that URL, records it in a WARC, and then uploads that WARC to ArchiveTeam servers for eventual injection into the Internet Archive (or other archive sites). To use ArchiveBot, drop by #archivebot on EFNet. To interact with ArchiveBot, you issue commands by typing them into the channel. Note you will need channel...
Topics: archiveteam, archivebot, webcrawl, robot, love
2.4B
Apr 8, 2011
by Internet Archive
Large-scale web harvests and national domain crawls performed for National Libraries, National Archives, preservation partners, research initiatives, and as part of special projects and custom crawling and research services.
Topic: ccs
This is a collection of URLs (and outlinked URLs) extracted from a random feed of 1% of all Tweets.
The Internet Archive Software Collection is the largest vintage and historical software library in the world, providing instant access to millions of programs, CD-ROM images, documentation and multimedia. The collection includes a broad range of software related materials including shareware, freeware, video news releases about software titles, speed runs of actual software game play, previews and promos for software games, high-score and skill replays of various game genres, and the art of...
305.4M
Oct 14, 2016
by Archive Team
Google has been planning to shut down the panoramic photo sharing site Panoramio since September 2014. The initial plan was to merge it with Google Views, which was a similar product. However, due to feedback from the Panoramio community, they held off on that move. Frank did an in-depth post about this in June 2015. Since then Google Views itself was merged into Street View. Google has now announced that they are finally shutting down Panoramio for good. As of November 4th, 2016, they will stop...
A longitudinal web archival collection based on URIs from the daily feed of Media Cloud that maps news media coverage of current events.
miscellaneous data
Topic: brad tofel
Listen to free audio books and poetry recordings! This library of audio books and poetry features digital recordings and MP3's from the Naropa Poetics Audio Archive, LibriVox, Project Gutenberg, Maria Lectrix, and Internet Archive users.
LibriVox - founded in 2005 - is a community of volunteers from all over the world who record public domain texts: poetry, short stories, whole books, even dramatic works, in many different languages. All LibriVox recordings are in the public domain in the USA and available as free downloads on the internet. If you are not in the USA, please check your country's copyright law before downloading. Please visit the LibriVox website where you can search for books that interest you. You can search or...
Additional collections of scanned books, articles, and other texts (usually organized by topic) are presented here.
Mishary Rasyid, per juz. Makes it easy to have an accompaniment for murojaah (review recitation) or for reading the Qur'an at one juz per day, also known as one day one juz (ODOJ).
(9 reviews)
Topics: Murottal, al-Qur'an, Mishary, Rasyid, al-Afasy, Juz, full, Quranic
863.7M
Jan 21, 2016
by Archive Team
Archive Team now searches many, many news sites, including extensive worldwide and obscure sources, to capture unique news stories for history.
A number of religious and spiritual organizations regularly upload their sermons and lectures to the Archive through the Open Source Audio collection. You may easily locate them here.
The American Libraries collection includes material contributed from across the United States. Institutions range from the Library of Congress to many local public libraries. As a whole, this collection of material brings holdings that cover many facets of American life and scholarship into the public domain. Significant portions of this collection have been generously sponsored by Microsoft, Yahoo!, The Sloan Foundation, and others.
A collection of applications and programs for smartphones, including Android, Apple and... well, the others.
Folksonomy: A system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content; this practice is also known as collaborative tagging, social classification, social indexing, and social tagging. Coined by Thomas Vander Wal, it is a portmanteau of folk and taxonomy. Folksoundomy: A collection of sounds, music and speech derived from the efforts of volunteers to make information as widely available as possible. Because...
Kodi (formerly XBMC) is a free and open-source media player software application developed by the XBMC Foundation, a non-profit technology consortium. Kodi is available for multiple operating systems and hardware platforms, with a software 10-foot user interface for use with televisions and remote controls. It allows users to play and view most streaming media, such as videos, music, podcasts, and videos from the Internet, as well as all common digital media files from local and network storage...
Free books for people with disabilities that impact reading. If you have a disability that interferes with reading printed text, then all of these books can be instantaneously available in your browser or via protected download. Want access? Individuals: If you would like to apply for access (it is free), make sure you have an Archive.org account and then fill in this form to contact the Vermont Mutual Aid Society. If you are affiliated with any of...
Topics: print disabled, print disability
150.3M
Dec 19, 2017
by Internet Archive Web Group
A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications. This collection contains WARC and CDX files that end up in Wayback (https://web.archive.org). See also bibliographic metadata corpuses at https://archive.org/details/ia_biblio_metadata
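To give a sense of what a CDX index entry looks like, here is a minimal parsing sketch. It assumes the seven space-separated fields that the Wayback CDX server returns by default (urlkey, timestamp, original, mimetype, statuscode, digest, length). CDX files in this collection may use a different column layout, which is normally declared in the file's header line. The `CdxRecord` type, the `parse_cdx_line` helper, and the sample line are all made up for illustration.

```python
from typing import NamedTuple


class CdxRecord(NamedTuple):
    """One index entry, using the assumed seven-field layout."""
    urlkey: str       # canonicalised, SURT-style lookup key
    timestamp: str    # 14-digit capture time, YYYYMMDDhhmmss
    original: str     # URL as it was captured
    mimetype: str
    statuscode: str
    digest: str       # content hash of the payload
    length: str       # compressed record length in bytes


def parse_cdx_line(line: str) -> CdxRecord:
    """Split one CDX line into named fields, rejecting other layouts."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return CdxRecord(*fields)


if __name__ == "__main__":
    # Invented sample entry; the digest is a placeholder, not a real hash.
    sample = ("org,example)/paper.pdf 20171219000000 "
              "http://example.org/paper.pdf application/pdf 200 "
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567 1234")
    record = parse_cdx_line(sample)
    print(record.timestamp, record.original, record.mimetype)
```

Keeping every field as a string avoids guessing at values such as `-`, which some indexes use for a missing status code or length.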
Books contributed by the Internet Archive.
Topic: internet archive books
This collection includes web crawls of the Federal Executive, Legislative, and Judicial branches of government performed at the end of US presidential terms of office.
Topics: web, end of term, US, federal government
Books in this collection may be borrowed by logged in patrons. You may read the books online in your browser or, in some cases, download them into Adobe Digital Editions, a free piece of software used for managing loans. Please note that works in this collection are protected by copyright law (Title 17 U.S. Code) and copying, redistribution or sale, whether or not for profit, by the recipient is not permitted unless authorized by the rightsholder or by law. See FAQs about...
Listen to sermons and lectures concerning religion and spirituality here.
201.1M
Nov 12, 2013
by ximm@archive.org
Miscellaneous high-value news sites
Topics: World news, US news, news
23.3M
Dec 7, 2011
by Wikimedia projects editors
The Wikimedia Foundation, Inc. is the non-profit parent organization of various free-content projects, most notably Wikipedia, the award-winning online encyclopedia. Here, you can find items related to the Wikimedia Foundation, most of which are available from the Wikimedia downloads website. What is available? Wikimedia projects' database dump files; incremental add/change dumps; incremental media tarballs; static HTML dumps of Wikipedia that were generated years ago and are no longer being produced....
Topics: wikipedia, wikimedia, dumps, downloads
Data crawled on behalf of the Internet Memory Foundation. This data is currently not publicly accessible. From Wikipedia: The Internet Memory Foundation (formerly the European Archive Foundation) is a non-profit foundation whose purpose is archiving web content. It supports projects and research which include the preservation and protection of multimedia content. Its archives form a digital library of cultural content.
WARCs from internal crawl testing.
Topics: web, cctld
Sermons, Lessons and Teachings, as well as supplemental and related materials.
Audiobooks in the Vietnamese language.
A great resource for podcasters: the Creative Commons Podcasting Legal Guide.
Watch full-length feature films, classic shorts, world culture documentaries, World War II propaganda, movie trailers, and films created in just ten hours: These options are all featured in this diverse library! Many of these videos are available for free download.
190.4M
Apr 23, 2019
by Public Resource
This library of books, audio, video, and other materials from and about India is curated and maintained by Public Resource. The purpose of this library is to assist the students and the lifelong learners of India in their pursuit of an education so that they may better their status and their opportunities and to secure for themselves and for others justice, social, economic and political. This library has been posted for non-commercial purposes only and facilitates fair dealing usage of...
Uploads from the general users of ARCHIVE.ORG related to Islamic culture, studies and related subjects. From the Wikipedia entry for Islamic Studies: Islamic studies refers to the academic study of Islam. Islamic studies can be seen under at least two perspectives: From a secular perspective, Islamic studies is a field of academic research whose subject is Islam as religion and civilization. From a traditional Islamic perspective, Islamic studies is an umbrella term for the "religious...
The Vintage Software collection gathers various efforts by groups to classify, preserve, and provide historical software. These older programs, many of them running on defunct and rare hardware, are provided for purposes of study, education, and historical reference.
This collection features audio collections reflecting music, art and culture. Collections include the unique contemporary compositions and performances found in the Other Minds collection, the hundreds of popular songs from the early 20th Century found in the 78 RPM collection and oral history projects.
120.9M
Nov 15, 2013
by Internet Archive
Programs in the TV News Archive, provided for research and educational purposes such as fact checking. Users can search across a collection of television news programs dating back to 2009, view short clips, share links to customized short quotes, embed customized short quotes, or borrow a copy of the full program.
(1 review)
203.5M
Nov 29, 2016
by Public Resource
This library of books, audio, video, and other materials from and about India is curated and maintained by Public Resource. The purpose of this library is to assist the students and the lifelong learners of India in their pursuit of an education so that they may better their status and their opportunities and to secure for themselves and for others justice, social, economic and political. This library has been posted for non-commercial purposes only and facilitates fair dealing usage of...
70.2M
Oct 19, 2018
by Various
As older software falls out of accessibility, various groups and individuals have created large compilations of wide ranges of titles and works, resulting in often-very-large compilations that are then accessible in bulk. Some are well-maintained catalogs while others are simply mega-size archive files. This collection compiles the compilations into one place.
Items included in the Television News search service. Part of TV News Archive.
281.2M
Feb 26, 2005
by Internet Archive
Feature films, shorts, silent films and trailers are available for viewing and downloading. Enjoy! View a list of all the Feature Films sorted by popularity. Do you want to post a feature film? First, figure out if it's in the Public Domain. Read this FAQ about determining if something is PD. If you're still not sure, post a question to the forum below with as much information about the movie as possible. One of our users might have relevant information.
Topic: Moving Images
Collection of texts by language.
This collection contains web crawls performed on the US Federal Executive, Legislative & Judicial branches of government in 2020-2021. Information about this project can be found here: https://end-of-term.github.io/eotarchive/ You can submit URLs to be archived here: https://digital2.library.unt.edu/nomination/eth2020/add/
Crawl of links posted on Hacker News.